Accelerated Stochastic Gradient Method for Composite Regularization
Authors
Abstract
Regularized risk minimization often involves nonsmooth optimization. This can be particularly challenging when the regularizer is a sum of simpler regularizers, as in the overlapping group lasso. Very recently, this difficulty has been alleviated by the proximal average, in which an implicitly defined nonsmooth function is employed to approximate the composite regularizer. In this paper, we propose a novel extension of this approach that applies the accelerated gradient method to stochastic optimization. On both general convex and strongly convex problems, the resulting approximation errors decrease at a faster rate than those of methods based on stochastic smoothing and the ADMM. This is also verified experimentally on a number of synthetic and real-world data sets.
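The key property exploited by the proximal-average approach is that the proximal map of the proximal average is the weighted average of the individual proximal maps, each of which is assumed to be cheap. Below is a minimal Python sketch of one accelerated stochastic step in this spirit; the soft-thresholding component, step size, and momentum schedule are illustrative assumptions, not the paper's exact algorithm.

import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (an example of a 'simple' regularizer)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def prox_average(x, eta, proxes, weights):
    """Prox of the proximal average = weighted average of the individual proxes."""
    return sum(w * p(x, eta) for w, p in zip(weights, proxes))

def accel_stochastic_step(x, x_prev, t, stochastic_grad, eta, proxes, weights):
    """One Nesterov-style accelerated step with a stochastic gradient of the loss,
    followed by a proximal-average proximal step (illustrative momentum schedule)."""
    momentum = (t - 1.0) / (t + 2.0)
    y = x + momentum * (x - x_prev)           # extrapolation
    g = stochastic_grad(y)                    # minibatch/sample gradient of the loss
    x_next = prox_average(y - eta * g, eta, proxes, weights)
    return x_next, x

For instance, with two l1-type components one would pass proxes = [soft_threshold, soft_threshold] and weights = [0.5, 0.5]; a structured regularizer such as the overlapping group lasso would supply group-wise proximal maps instead.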
Similar resources
Composite Objective Mirror Descent
We present a new method for regularized convex optimization and analyze it under both online and stochastic optimization settings. In addition to unifying previously known first-order algorithms, such as the projected gradient method, mirror descent, and forward-backward splitting, our method yields new analysis and algorithms. We also derive specific instantiations of our method for commonly use...
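As a concrete special case of the update described above, taking the squared Euclidean distance as the Bregman divergence reduces composite-objective mirror descent to a proximal gradient step. The sketch below, with an l1 regularizer, is an illustrative instance rather than the paper's general formulation.

import numpy as np

def comid_step_euclidean(x, grad, eta, lam):
    """Composite mirror-descent step with Bregman divergence 0.5*||x - x_t||^2:
    x_{t+1} = argmin_x  eta*<g_t, x> + eta*lam*||x||_1 + 0.5*||x - x_t||^2,
    i.e. soft-thresholding applied to a gradient step."""
    z = x - eta * grad
    return np.sign(z) * np.maximum(np.abs(z) - eta * lam, 0.0)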
Accelerating Stochastic Composition Optimization
Consider the stochastic composition optimization problem where the objective is a composition of two expected-value functions. We propose a new stochastic first-order method, namely the accelerated stochastic compositional proximal gradient (ASC-PG) method, which updates based on queries to the sampling oracle using two different timescales. The ASC-PG is the first proximal gradient method for t...
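To illustrate the two-timescale idea, the sketch below maintains a running estimate of the inner expectation with one step size while updating the decision variable with another; the sampled quantities, averaging weight, and proximal map are placeholders and do not reproduce ASC-PG's exact extrapolation scheme.

import numpy as np

def compositional_prox_step(x, y, sample_g, sample_g_jac, sample_grad_f, prox_r,
                            eta_x, beta_y):
    """One two-timescale step for  min_x E_v[f_v(E_w[g_w(x)])] + r(x):
    track a running estimate y of the inner map g(x), then take a proximal
    gradient step using a chain-rule gradient estimate built from samples."""
    y_new = (1.0 - beta_y) * y + beta_y * sample_g(x)      # track inner expectation
    grad_est = sample_g_jac(x).T @ sample_grad_f(y_new)    # chain-rule gradient estimate
    x_new = prox_r(x - eta_x * grad_est, eta_x)            # proximal step on r
    return x_new, y_new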
Geometric Descent Method for Convex Composite Minimization
In this paper, we extend the geometric descent method recently proposed by Bubeck, Lee and Singh [5] to nonsmooth and strongly convex composite problems. We prove that the resulting algorithm, GeoPG, converges with a linear rate of (1 − 1/√κ) and thus achieves the optimal rate among first-order methods, where κ is the condition number of the problem. Numerical results on linear regression and ...
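For context, a per-iteration contraction factor of (1 − 1/√κ) gives the standard first-order iteration complexity. A short derivation, with ε_k denoting the optimality gap after k iterations:

\epsilon_k \le \left(1 - \frac{1}{\sqrt{\kappa}}\right)^{k} \epsilon_0 \le e^{-k/\sqrt{\kappa}}\,\epsilon_0,

so \epsilon_k \le \epsilon holds as soon as k \ge \sqrt{\kappa}\,\log(\epsilon_0/\epsilon), i.e. after O(\sqrt{\kappa}\log(1/\epsilon)) iterations.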
Stochastic Proximal Gradient Descent for Nuclear Norm Regularization
In this paper, we utilize stochastic optimization to reduce the space complexity of convex composite optimization with a nuclear norm regularizer, where the variable is a matrix of size m × n. By constructing a low-rank estimate of the gradient, we propose an iterative algorithm based on stochastic proximal gradient descent (SPGD), and take the last iterate of SPGD as the final solution. The ma...
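The proximal step for the nuclear norm is singular-value soft-thresholding; the sketch below shows a plain stochastic proximal gradient iteration with that prox (the low-rank gradient estimation that drives the paper's space savings is not reproduced here).

import numpy as np

def prox_nuclear(Z, tau):
    """Proximal operator of tau * ||.||_* : soft-threshold the singular values."""
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)
    return (U * s_shrunk) @ Vt

def spgd_step(X, stochastic_grad, eta, lam):
    """One stochastic proximal gradient step for  E[loss(X)] + lam*||X||_* ."""
    G = stochastic_grad(X)                     # unbiased gradient estimate of the loss
    return prox_nuclear(X - eta * G, eta * lam)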
CompAdaGrad: A Compressed, Complementary, Computationally-Efficient Adaptive Gradient Method
The adaptive gradient online learning method known as AdaGrad has seen widespread use in the machine learning community in stochastic and adversarial online learning problems and more recently in deep learning methods. The method’s full-matrix incarnation offers much better theoretical guarantees and potentially better empirical performance than its diagonal version; however, this version is co...
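For reference, the diagonal variant contrasted above accumulates coordinate-wise squared gradients and scales each step accordingly; a minimal sketch, with an illustrative step size and epsilon:

import numpy as np

def adagrad_diagonal_step(x, grad, accum, eta=0.1, eps=1e-8):
    """Diagonal AdaGrad: per-coordinate step sizes from accumulated squared gradients.
    The full-matrix variant instead accumulates outer products g g^T, which is far
    more expensive per step."""
    accum = accum + grad ** 2
    x = x - eta * grad / (np.sqrt(accum) + eps)
    return x, accum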
Journal:
Volume, Issue:
Pages: -
Publication date: 2014